Rajesh Jayaram
Email: rkjayaram (at) google (dot) com
I am a Research Scientist at Google NYC in the Algorithms and Optimization Group. I received my PhD in computer science from Carnegie Mellon University in the summer of 2021, where I was fortunate to be advised by David Woodruff. Prior to that, I received my bachelor's degree from Brown University in May 2017.
Research: I am interested primarily in sublinear algorithms and high-dimensional geometry, specifically sketching, streaming, and distributed algorithms for large-scale computational problems. More broadly, I am interested in dimensionality reduction methods: namely, to what extent can we compress the significant components of an enormous, noisy dataset? My work also spans the areas of property testing, machine learning, and optimization.
Google Scholar, DBLP.
Teaching:
I taught as an Adjunct Professor at NYU's Tandon School of Engineering.
- Spring 2022: NYU CS-GY 6763 - Algorithmic Machine Learning and Data Science
Workshops:
Dissertation:
Preprints:
- CRISP: Clustering Multi-Vector Representations for Denoising and Pruning
With João Veneroso, Jinmeng Rao, Gustavo Hernández Ábrego, Majid Hadian, and Daniel Cer. [arXiv]
Publications:
- Approximating High-Dimensional Earth Mover's Distance as Fast as Closest Pair
With Lorenzo Beretta, Vincent Cohen-Addad, and Erik Waingarten.
FOCS 2025
- Metric Embeddings Beyond Bi-Lipschitz Distortion via Sherali-Adams
With Ainesh Bakshi, Vincent Cohen-Addad, Samuel B. Hopkins, and Silvio Lattanzi.
COLT 2025 [arXiv]
- Randomized Dimensionality Reduction for Euclidean Maximization and Diversity Measures
With Jie Gao, Benedikt Kolbe, Shay Sapir, Chris Schwiegelshohn, Sandeep Silwal, and Erik Waingarten.
ICML 2025 [arXiv]
- Unleashing Graph Partitioning for Large-Scale Nearest Neighbor Search
With Laxman Dhulipala, Lars Gottesbüren, and Jakub Łącki.
VLDB 2025 [arXiv]
- Near-Optimal Spectral Density Estimation via Explicit and Implicit Deflation
With Rajarshi Bhattacharjee, Cameron Musco, Christopher Musco, and Archan Ray.
SODA 2025 [arXiv]
- Massively Parallel Minimum Spanning Tree in General Metric Spaces
With Amir Azarmehr, Soheil Behnezhad, Jakub Łącki, Vahab Mirrokni, and Peilin Zhong.
SODA 2025 [arXiv]
- MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings
With Laxman Dhulipala, Majid Hadian, Jason Lee, and Vahab Mirrokni.
NeurIPS 2024 [arXiv], [Google Research Blog], [Weaviate Podcast]
- Efficient Centroid-Linkage Clustering
With MohammadHossein Bateni, Laxman Dhulipala, Willem Fletcher, Kishen N Gowda, D Ellis Hershkowitz, and Jakub Łącki.
NeurIPS 2024 [arXiv]
- Metric Clustering and MST with Strong and Weak Distance Oracles
With MohammadHossein Bateni, Prathamesh Dharangutte, and Chen Wang.
COLT 2024 [arXiv]
- Parallel and Sequential Hardness of Hierarchical Graph Clustering
With MohammadHossein Bateni, Laxman Dhulipala, Kishen Gowda, D Ellis Hershkowitz, and Jakub Łącki.
ICALP 2024 [arXiv]
- Dynamic PageRank: Algorithms and Lower Bounds
With Jakub Łącki, Slobodan Mitrović, Krzysztof Onak, and Piotr Sankowski.
ICALP 2024 [arXiv]
- Data-Dependent LSH for the Earth Mover's Distance
With Erik Waingarten and Tian Zhang.
STOC 2024 [arXiv] [Talk]
- HyperAttention: Long-Context Attention in Near-Linear Time
With Insu Han, Amin Karbasi, Vahab Mirrokni, David Woodruff, and Amir Zandieh.
ICLR 2024 [arXiv]
- Massively Parallel Algorithms for High-Dimensional Euclidean Minimum Spanning Tree
With Vahab Mirrokni, Shyam Narayanan, and Peilin Zhong.
SODA 2024 [arXiv]
- Fully Dynamic Consistent k-Center Clustering
With Christoph Grunau, Bernhard Haeupler, Jakub Łącki, and Václav Rozhoň.
SODA 2024 [arXiv]
- Streaming Algorithms with Few State Changes
With David Woodruff and Samson Zhou.
PODS 2024 [arXiv]
- A Near-Linear Time Algorithm for the Chamfer Distance
With Ainesh Bakshi, Piotr Indyk, Sandeep Silwal, and Erik Waingarten.
NeurIPS 2023 [arXiv]
- Streaming Euclidean MST to a Constant Factor
With Vincent Cohen-Addad, Xi Chen, Amit Levi, and Erik Waingarten.
STOC 2023 [arXiv]
- Optimal Fully Dynamic k-Centers Clustering
With MohammadHossein Bateni, Hossein Esfandiari, and Vahab Mirrokni.
SODA 2023 [arXiv]
[Merged Paper] with Hendrik Fichtenberger, Monika Henzinger, and Andreas Wiese
- Differentially Oblivious Relational Database Operators
With Lianke Qin, Elaine Shi, Zhao Song, Danyang Zhuo, and Shumo Chu.
VLDB 2023 [arXiv]
- Stars: Tera-Scale Graph Building for Clustering and Learning
With CJ Carey, Jonathan Halcrow, Vahab Mirrokni, Warren Schudy, and Peilin Zhong.
NeurIPS 2022 [arXiv]
- New Streaming Algorithms for High Dimensional EMD and MST
With Xi Chen, Amit Levi, and Erik Waingarten.
STOC 2022 [arXiv], [Talk]
- Truly Perfect Samplers for Data Streams and Sliding Windows
With David Woodruff and Samson Zhou.
PODS 2022 [arXiv]
- An Optimal Algorithm for Triangle Counting in a Stream
With John Kallaugher.
APPROX 2021 [arXiv]
- Learning and Testing Junta Distributions with Subcube Conditioning
With Xi Chen, Amit Levi, and Erik Waingarten.
COLT 2021 [arXiv]
- In-Database Regression in Input Sparsity Time
With Alireza Samadian, David Woodruff, and Peng Ye.
ICML 2021 [arXiv]
- When is Approximate Counting for Conjunctive Queries Tractable?
With Marcelo Arenas, Luis Alberto Croquevielle, and Cristian Riveros.
STOC 2021 [arXiv]
- Testing Positive Semi-Definiteness via Random Submatrices
With Ainesh Bakshi and Nadiia Chepurko.
FOCS 2020 [arXiv], [Extended Abstract], [Talk @ WOLA'20], [Longer Talk]
- A Framework for Adversarially Robust Streaming Algorithms
With Omri Ben-Eliezer, David Woodruff, and Eylon Yogev.
PODS 2020 and Journal of the ACM [arXiv]
PODS Best Paper Award, 2020
Invited to the Journal of the ACM
2021 ACM SIGMOD Research Highlight Award
Invited to HALG 2021
- Span Recovery for Deep Neural Networks with Applications to Input Obfuscation
With Qiuyi Zhang and David Woodruff.
ICLR 2020 [arXiv], [Short Talk]
- Optimal Sketching for Kronecker Product Regression and Low Rank Approximation
With Huaian Diao, Zhao Song, Wen Sun, and David Woodruff.
NeurIPS 2019 [arXiv] [Poster]
- Towards Optimal Moment Estimation in Streaming and Distributed Models
With David Woodruff.
APPROX 2019 [arXiv]
- Learning Two Layer Rectified Neural Networks in Polynomial Time
With Ainesh Bakshi and David Woodruff.
COLT 2019 [arXiv], [Talk @ COLT]
- Efficient Logspace Classes for Enumeration, Counting, and Uniform Generation
With Marcelo Arenas, Luis Alberto Croquevielle, and Cristian Riveros.
PODS 2019 and Journal of the ACM [arXiv], [Talk], [SIGMOD Technical Perspective]
PODS Best Paper Award, 2019
Invited to the Journal of the ACM
2021 ACM SIGMOD Research Highlight Award
- Weighted Reservoir Sampling from Distributed Streams
With Gokarna Sharma, Srikanta Tirthapura, and David P. Woodruff.
PODS 2019 [arXiv]
- Perfect L_p Sampling in a Data Stream
With David Woodruff.
FOCS 2018 and SIAM Journal on Computing [arXiv]
- Data Streams with Bounded Deletions
With David Woodruff.
PODS 2018 [arXiv]
- Approximating Language Edit Distance Beyond Fast Matrix Multiplication: Ultralinear Grammars Are Where Parsing Becomes Hard!
With Barna Saha.
ICALP 2017 [Full Version], [Conference Version]